Unsupervised multilingual learning

نویسنده

Benjamin Snyder

چکیده

For centuries, scholars have explored the deep links among human languages. In this thesis, we present a class of probabilistic models that exploit these links as a form of naturally occurring supervision. These models allow us to substantially improve performance for core text processing tasks, such as morphological segmentation, part-of-speech tagging, and syntactic parsing. Besides these traditional NLP tasks, we also present a multilingual model for lost language decipherment. We test this model on the ancient Ugaritic language. Our results show that we can automatically uncover much of the historical relationship between Ugaritic and Biblical Hebrew, a known related language. Thesis Supervisor: Regina Barzilay Title: Associate Professor

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Construction of supervised and unsupervised learning systems for multilingual text categorization

Due to the availability of a huge amount of textual data from a variety of sources, users of internationally distributed information regions need effective methods and tools that enable them to discover, retrieve and categorize relevant information, in whatever language and form it may have been stored. This drives a convergence of numerous interests from diverse research communities focusing o...

متن کامل

Multilingual Word Embeddings using Multigraphs

We present a family of neural-network– inspired models for computing continuous word representations, specifically designed to exploit both monolingual and multilingual text. This framework allows us to perform unsupervised training of embeddings that exhibit higher accuracy on syntactic and semantic compositionality, as well as multilingual semantic similarity, compared to previous models trai...

متن کامل

Unsupervised Multilingual Learning for POS Tagging

We demonstrate the effectiveness of multilingual learning for unsupervised part-of-speech tagging. The key hypothesis of multilingual learning is that by combining cues from multiple languages, the structure of each becomes more apparent. We formulate a hierarchical Bayesian model for jointly predicting bilingual streams of part-of-speech tags. The model learns language-specific features while ...

متن کامل

Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches

We demonstrate the effectiveness of multilingual learning for unsupervised part-of-speech tagging. The central assumption of our work is that by combining cues from multiple languages, the structure of each becomes more apparent. We consider two ways of applying this intuition to the problem of unsupervised part-of-speech tagging: a model that directly merges tag structures for a pair of langua...

متن کامل

Adding More Languages Improves Unsupervised Multilingual Part-of-Speech Tagging: a Bayesian Non-Parametric Approach

We investigate the problem of unsupervised part-of-speech tagging when raw parallel data is available in a large number of languages. Patterns of ambiguity vary greatly across languages and therefore even unannotated multilingual data can serve as a learning signal. We propose a non-parametric Bayesian model that connects related tagging decisions across languages through the use of multilingua...

متن کامل

Unsupervised Multilingual Word Sense Disambiguation via an Interlingua

We present an unsupervised method for resolving word sense ambiguities in one language by using statistical evidence assembled from other languages. It is crucial for this approach that texts are mapped into a language-independent interlingual representation. We also show that the coverage and accuracy resulting from multilingual sources outperform analyses where only monolingual training data ...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2010

Unsupervised multilingual learning

نویسنده

چکیده

منابع مشابه

Construction of supervised and unsupervised learning systems for multilingual text categorization

Multilingual Word Embeddings using Multigraphs

Unsupervised Multilingual Learning for POS Tagging

Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches

Adding More Languages Improves Unsupervised Multilingual Part-of-Speech Tagging: a Bayesian Non-Parametric Approach

Unsupervised Multilingual Word Sense Disambiguation via an Interlingua

عنوان ژورنال:

اشتراک گذاری